The decision to work with four essays was deliberate. Rather than drawing on synthetic datasets or contrived writing samples, we sought authentic undergraduate work that reflected the variety of styles instructors encounter in real assignments. The selected essays came from English-language coursework completed at Pan-European University “Apeiron” in Banja Luka, Bosnia and Herzegovina. They were chosen to represent two distinct stylistic registers – two formal academic essays and two conversational, narrative texts – on the premise that stylistic variation could influence how detection tools classify writing. The formal essays displayed a structured argument, academic vocabulary, and a conventional organization of ideas. The conversational ones incorporated personal voice, chronological sequencing, and features typical of non-native English writers, such as simplified syntax, direct address, and occasional grammatical quirks. By pairing these two registers, the study could explore whether the same tools would react differently to polished academic prose than to more relaxed, personal styles.
All identifying information, including names, course details, and assignment codes, was removed prior to analysis. Since these essays had been submitted for coursework under conditions that allowed anonymized use for pedagogical and research purposes, they qualified as secondary data under institutional ethical standards. Even so, the ethical considerations were not limited to the technical process of anonymization; they also extended to reflection on how the findings could be responsibly used or misused in academic contexts. Because AI-detection results may influence disciplinary action in real classrooms, it was crucial to approach this project with caution, ensuring that outcomes were presented as illustrative rather than definitive. A formal ethics board review was not required for this study, given its reliance on anonymized secondary material, but the research was nonetheless guided by established norms of confidentiality, fairness, and proportionality.
To provide a reasonable cross-section of commonly used detection platforms, four AI-detection tools were selected: Grammarly, QuillBot, BypassAI, and Phrasely. While all four are encountered in higher education, they differ in detection methods, user interfaces, and the level of feedback provided. Grammarly integrates AI detection within a broader grammar and style platform, often presenting results as a probability score alongside editorial suggestions. QuillBot is primarily known for its paraphrasing function but also offers an AI-detection feature, making it a hybrid tool that reflects common student-facing software. BypassAI positions itself around detector evasion and counter-detection, providing an interesting counterpoint to mainstream academic tools. Phrasely is a more recent addition to the detection landscape, marketed as a stylistic and semantic analyzer. Selecting this mix allowed us to observe a range of algorithmic behaviors rather than capturing near-identical outputs from similar systems. It also enabled us to compare how commercial, pedagogical, and counter-detection-oriented tools behaved when confronted with the same material.
Each essay was processed individually through each of the four detection tools. To avoid introducing variability from uncontrolled factors, all submissions were carried out in a single browser session, on the same device, and under identical network conditions. The default settings of each tool were maintained so that results reflected a typical user experience, rather than optimized or experimental configurations. No reformatting, rephrasing, or other alterations were made to the original texts beyond anonymization. The order of submissions was kept consistent across tools, and records were carefully maintained so that any later replication would follow the same procedural sequence.
For each submission, we recorded the primary classification – such as “AI-generated” or “likely human” – along with any numerical probability scores, flagged segments, or explanatory notes provided by the tool. These results were entered into a comparative spreadsheet, which contained fields for the essay identifier, style category, detection tool, classification label, probability score (if available), and any flagged segments or explanatory notes.
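For readers who wish to replicate the recording step, the sketch below shows one way such a comparative spreadsheet could be populated programmatically. It is a minimal illustration only: the column names mirror the fields listed above, while the CSV format, file name, and placeholder values are assumptions of the sketch, not part of the study’s actual (manual) workflow.

```python
import csv

# Column names mirror the spreadsheet fields described in the text;
# the names themselves are illustrative choices, not the study's own.
FIELDS = [
    "essay_id",           # anonymized essay identifier
    "style_category",     # "formal" or "conversational"
    "detection_tool",     # Grammarly, QuillBot, BypassAI, or Phrasely
    "classification",     # primary label reported by the tool
    "probability_score",  # numeric score; left blank when none is given
    "notes",              # flagged segments or explanatory notes
]

# A placeholder row showing the intended shape of one record.
# The values below are dummies, not results from the study.
placeholder = {
    "essay_id": "E1",
    "style_category": "formal",
    "detection_tool": "Grammarly",
    "classification": "likely human",  # placeholder label only
    "probability_score": "",           # blank: no score reported
    "notes": "",
}

with open("detection_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerow(placeholder)
```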